
[model] use self attn in megatron for gated attn#624

Merged
zhuzilin merged 2 commits into main from feature/qwen3next
Oct 29, 2025

Conversation

@zhuzilin
Contributor

@zhuzilin zhuzilin commented Oct 29, 2025

This enables context parallelism (CP) and tensor parallelism (TP) for the gated attention part of Qwen3Next models, and is largely inspired by https://github.com/alibaba/Pai-Megatron-Patch

After this PR, tensor parallelism can be used for Qwen3Next.
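To illustrate why routing the gated attention through Megatron's self-attention unlocks tensor parallelism, here is a minimal NumPy sketch (not slime's or Megatron's actual code) of how a standard attention layer shards across TP ranks: each rank holds a column shard of the QKV projection (a subset of heads) and the matching row shard of the output projection, and summing the per-rank partial outputs (the all-reduce step) reproduces the unsharded result. All names and shapes here are illustrative assumptions.

```python
# Illustrative sketch of tensor-parallel self-attention, simulated on one
# process. Assumption: heads divide evenly across TP ranks.
import numpy as np

rng = np.random.default_rng(0)
seq, d_model, n_heads, tp = 4, 8, 2, 2
d_head = d_model // n_heads

x = rng.standard_normal((seq, d_model))
w_qkv = rng.standard_normal((d_model, 3 * d_model))  # fused QKV projection
w_out = rng.standard_normal((d_model, d_model))      # output projection

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    return probs @ v

def full_attention(x):
    # Reference: unsharded multi-head attention.
    qkv = x @ w_qkv
    q, k, v = np.split(qkv, 3, axis=-1)
    heads = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        heads.append(attention(q[:, sl], k[:, sl], v[:, sl]))
    return np.concatenate(heads, axis=-1) @ w_out

def tp_attention(x):
    # Each rank owns n_heads // tp heads: column shards of w_qkv and the
    # matching row shard of w_out. Partial outputs are summed, which is
    # what the all-reduce does across real ranks.
    out = np.zeros((seq, d_model))
    heads_per_rank = n_heads // tp
    for rank in range(tp):
        hs = rank * heads_per_rank * d_head
        he = hs + heads_per_rank * d_head
        wq = w_qkv[:, hs:he]                            # Q columns for rank
        wk = w_qkv[:, d_model + hs:d_model + he]        # K columns for rank
        wv = w_qkv[:, 2 * d_model + hs:2 * d_model + he]  # V columns for rank
        q, k, v = x @ wq, x @ wk, x @ wv
        local = []
        for h in range(heads_per_rank):
            sl = slice(h * d_head, (h + 1) * d_head)
            local.append(attention(q[:, sl], k[:, sl], v[:, sl]))
        local = np.concatenate(local, axis=-1)
        out += local @ w_out[hs:he, :]  # row shard + simulated all-reduce
    return out

assert np.allclose(full_attention(x), tp_attention(x))
```

Because the linear-attention (gated DeltaNet) part of Qwen3Next mixes state across the full hidden dimension, it does not shard this way; reusing Megatron's self-attention for the gated attention blocks lets at least those layers pick up TP and CP for free.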

@zhuzilin zhuzilin marked this pull request as ready for review October 29, 2025 13:43
@zhuzilin zhuzilin merged commit 08118ec into main Oct 29, 2025
2 of 4 checks passed
llltttwww pushed a commit to llltttwww/slime that referenced this pull request Nov 30, 2025
Yangruipis pushed a commit to rednote-ai/slime that referenced this pull request Feb 28, 2026
